To evaluate or measure ‘something’, one first has to understand it.
So the first step in building the index is to answer the following
question: what is success in football?
As in any other sport, the simplest definition of success is winning. But
because of how football is played and ruled, that simplicity can get
lost.
Depending on who is asked, success could mean winning at all costs or
winning only in a certain style. To avoid ambiguity, the index was built
on the definition implied above: success = win.
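One way to read "success = win" as a number is to score how far a team advanced in each tournament and then add those scores up across tournaments. The sketch below only illustrates that idea: the stage names, the point values, and the toy results are hypothetical placeholders, not the actual WCS_I weights or data.

```r
# Hypothetical stage scores: placeholder values, NOT the real WCS_I weights.
stage_points <- c(group_stage = 0, round_of_16 = 0.25, quarter_final = 0.5,
                  semi_final = 0.75, final = 1)

# Made-up results: furthest stage reached per team and tournament.
results <- data.frame(
  team  = c("Team A", "Team A", "Team B", "Team B"),
  year  = c(2018, 2022, 2018, 2022),
  stage = c("round_of_16", "final", "group_stage", "quarter_final"),
  stringsAsFactors = FALSE
)

# Look up each row's score by stage name, then sum the scores per team.
results$s_points <- unname(stage_points[results$stage])
index_sum <- aggregate(s_points ~ team, data = results, FUN = sum)
print(index_sum)
```

With this kind of scoring, a team that goes deeper into the tournament more often ends up with a higher total, regardless of how it played.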
ggplot(ip_df, aes(x = achieved_stage_ip, y = s_points_ip)) +
  geom_col(fill = "#3399CC", alpha = 0.6) +
  scale_y_continuous(limits = c(-0.04, 1.3), breaks = s_points_ip) +
  labs(title = "How WCS_I works",
       subtitle = "WCS_I applied to current world champions: Argentina (a.k.a 'La Scaloneta')",
       x = '',
       y = "WCS_I") +
  theme(plot.title = element_text(size = 18, face = 'bold', hjust = 0),
        plot.subtitle = element_text(size = 14, face = 'bold', hjust = 0),
        axis.text = element_text(size = 14), axis.ticks.y = element_blank(),
        panel.grid.major.y = element_line(color = 'gray', linewidth = .25,
                                          linetype = "dashed"),
        panel.background = element_blank()) +
  geom_segment(data = data.frame(x = 1, xend = 5.8, y = -.04, yend = -.04),
               aes(x = x, xend = xend, y = y, yend = yend),
               color = "#3399CC", alpha = 0.85,
               linewidth = 4, linejoin = c('mitre'),
               arrow = arrow(angle = 10, length = unit(0.4, 'inches'))) +
  draw_image(img_wc_1, x = 5.5, y = 1, height = .25, scale = 1) +
  geom_label(data = arg_expl_summ_df[1, ],
             aes(x = 1,
                 y = y_coord - 0.18, label = summ),
             fill = "#3399CC", alpha = 0.1, size = 3) +
  geom_label(data = arg_expl_summ_df[-1, ],
             aes(x = c(1, 2, 3, 4, 5),
                 y = y_coord - 0.18, label = summ),
             fill = "#3399CC", alpha = 0.6, size = 3) +
  annotation_custom(img_saint_1 %>% rasterGrob(),
                    xmin = 5.2, xmax = 6.2, ymin = 1.35, ymax = 1.55) +
  coord_cartesian(clip = "off")
p <- list(projection = list(type = 'natural earth'),
          showcountries = TRUE, countrycolor = "black",
          showland = TRUE, landcolor = "white")
plot_geo(wc_map_wcs_i, locationmode = 'code', locations = ~ISO_codes) %>%
  add_trace(z = ~sum_s_points, text = text_map, hoverinfo = "text",
            color = ~sum_s_points, colors = 'Blues',
            marker = list(line = list(color = "green", width = .5))) %>%
  add_annotations(text = paste("~ 5 countries were excluded from the map: England, Scotland, Serbia & Montenegro, Yugoslavia, and Wales.",
                               "* The British teams could not be set apart, and thus were aggregated as United Kingdom in the map.",
                               "* Serbia & Montenegro and Yugoslavia no longer exist.",
                               sep = "\n"),
                  showarrow = F, x = .5, y = -0.1) %>%
  colorbar(title = "Index sum", x = 1, y = .8) %>%
  layout(title = paste("<b>WCS_I around the globe<br>(1998 - 2022)</b>"), geo = p,
         images = list(list(source = raster2uri(pen),
                            xref = "paper", yref = "paper", x = .85, y = 1.06,
                            sizex = .17, sizey = .14)))
WCS_I is a simple way to understand how national teams have performed in the World Cup over the last couple of decades or so. What makes it simple is its single premise: to be successful, a team has to win.
Not everybody feels comfortable with such a premise because it lacks a ‘how to win’ component; in other words, it doesn’t consider the style of play or the tactics behind the result. The index creator empathizes with the sentiment, but including this component raises the following issues:
1. How could anybody weigh one style of play over another, when at the
end of the day every known style of play has delivered results, even in
modern football?
2. There is no data available that captures the style of play of every
team in the sample. And even if there were, figuring out each manager’s
preferred tactics, and whether they can make them work on the pitch,
would be a lot of work.
The visual above shows plausible understandings of success and how the component mentioned before (how to win) gets involved in the question ‘What is success in football?’.
In case the previous map isn’t realistic enough, here’s another one
with a higher resolution.
Note that the higher resolution can slow down manipulation and
navigation on the map.
wc_map_wcs_i %>%
  select(., -c(n.x, n.y)) %>%
  datatable(rownames = F,
            extensions = 'Buttons',
            options = list(dom = 'Blfrtip',
                           buttons = c('copy', 'csv', 'excel'),
                           pageLength = 4, lengthMenu = c(1, 2, 4)))
wc_map %>%
  select(., -c(n.x, n.y)) %>%
  datatable(rownames = F,
            extensions = 'Buttons',
            options = list(dom = 'Blfrtip',
                           buttons = c('copy', 'csv', 'excel'),
                           pageLength = 4, lengthMenu = c(4, 8, 12)))
wc_all_labeled_long %>%
  select(., -n) %>%
  datatable(rownames = F,
            extensions = 'Buttons',
            options = list(dom = 'Blfrtip',
                           buttons = c('copy', 'csv', 'excel'),
                           pageLength = 4, lengthMenu = c(4, 8, 12)))
wc_all_labeled_wide_sum %>%
  select(., -n) %>%
  datatable(rownames = F,
            extensions = 'Buttons',
            options = list(dom = 'Blfrtip',
                           buttons = c('copy', 'csv', 'excel'),
                           pageLength = 4, lengthMenu = c(4, 8, 12)))
reactable(img_cred, pagination = FALSE, highlight = TRUE,
          height = 175,
          columns = list(
            imagery_tag = colDef(name = "Image label"),
            imagery_credit = colDef(name = "Credits (w/link)",
                                    html = TRUE,
                                    cell = function(value, index) {
                                      sprintf('<a href="%s" target="_blank">%s</a>', img_cred$imagery_link[index], value)
                                    }),
            imagery_link = colDef(show = F)
          )
)
# Unify: 1998 ~ 2018. ---------
# 2022 will be included as soon as fbref updates its site...
# wc_year <- seq(1998,2022,4) %>% as.character() %>% paste(., collapse = "|")
LINK <- "https://fbref.com/en/comps/"
wc_access <- read_html(LINK)
# Regex alternation of the tournament years, plus the "1/World-Cup-Stats" path
wc_year <- seq(1998, 2018, 4) %>% as.character(.) %>%
  paste(., collapse = "|") %>% paste0(., "|1/World-Cup-Stats")
wc_links <- wc_access %>%
  html_nodes("div table#comps_intl_fa_nonqualifier_senior tr.gender-m th a") %>%
  html_attr("href") %>% paste0("https://fbref.com", . ) %>%
  .[str_detect(. , "World-Cup", negate = F)] %>%
  read_html() %>% html_nodes("div th a") %>% html_attr("href") %>%
  paste0("https://fbref.com", . ) %>% .[str_detect(. , wc_year, negate = F)]
# Got 3 groups: round of 16 (R16), champions (a.k.a. winners), and group stage
# R16: entries 17 to 32 of the match-up team links
wc_links_r16 <- lapply(wc_links, function(i) {
  read_html(i) %>% html_nodes("div.matchup-team a") %>% html_text() %>% .[17:32] %>%
    as.data.frame()
})
# Winners: match-up teams whose node contains "winner"
wc_links_winners <- lapply(wc_links, function(i) {
  read_html(i) %>% html_nodes("div.match-summary div.matchup-team") %>%
    .[str_detect(. , "winner", negate = F)] %>% html_nodes("a") %>% html_text() %>%
    .[-2] %>% as.data.frame()
})
# Group stage (gs)
wc_links_gs <- lapply(wc_links, function(i) {
  read_html(i) %>% html_nodes("div div.section_wrapper table tbody tr td a") %>%
    html_text() %>% as.data.frame()
}) #YEP!
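The year filter used above boils down to building one regular expression with an alternative per tournament year and keeping only the links that match it. Here is a minimal base R sketch of that step; the `example.com` links are made up for illustration and are not real fbref URLs, and `grepl()` stands in for `str_detect()`.

```r
# Same pattern construction as above: "1998|2002|...|2018|1/World-Cup-Stats".
wc_year <- paste0(paste(seq(1998, 2018, 4), collapse = "|"), "|1/World-Cup-Stats")

# Hypothetical links standing in for the scraped hrefs (not real URLs).
links <- c(
  "https://example.com/comps/1/2014/2014-World-Cup-Stats",
  "https://example.com/comps/1/1994/1994-World-Cup-Stats",
  "https://example.com/comps/1/World-Cup-Stats",
  "https://example.com/comps/1/2018/2018-World-Cup-Stats"
)

# Keep only the links that match at least one alternative of the pattern.
kept <- links[grepl(wc_year, links)]
print(kept)
```

The pattern is not anchored, so a year matches wherever it appears in the link. That is good enough when the links have already been narrowed down to World Cup pages, but it would need tightening for a more mixed list.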
# 2nd web scraping for ISO codes (needed to map countries) -------
LINK.2 <- "https://countrycode.org/"
iso.iso <- read_html(LINK.2)
# Country names
iso.1 <- iso.iso %>%
  html_nodes("div table tbody tr a") %>% html_text()
# ISO codes: every 6th table cell starting at the 3rd, characters 6 to 8
iso.2 <- iso.iso %>%
  html_nodes("div table tbody tr td") %>% .[seq(3, 2160, 6)] %>%
  html_text() %>% substring(., 6, 8)
# Exploratory checks used to spot the duplication issue:
# bind_cols(iso.1, iso.2)
# iso.1 %>% length()/2
# iso.2 %>% length()
# as.data.frame(iso.2) %>% count(iso.2) # Found duplicates
# iso.1 %>% unique() %>% length()
# iso.2 %>% unique() %>% length()
# Detect and solve the duplication issue.
iso.1 <- iso.1 %>% unique()
iso.2 <- iso.2 %>% unique() # done...
ISO_tab <- bind_cols("countries" = iso.1, "ISO_codes" = iso.2)
IOS_rebels <- ISO_tab %>%
  anti_join(., st_transform(ne_countries(scale = 'medium', type = 'countries',
                                         returnclass = 'sf')) %>%
              select("name", "continent"), by = c("countries" = "name" )) %>%
  .[1] %>% unlist() %>% as.character()
reb_corrected_ISO_list <- c("Antigua and Barb.", "Bosnia and Herz.", "Invalid",
                            "British Virgin Is.", "Cayman Is.",
                            "Central African Rep.", "Invalid", "Invalid", "Cook Is.",
                            "Curaçao", "Czech Rep.", "Dem. Rep. Congo", "Dominican Rep.",
                            "Timor-Leste", "Eq. Guinea", "Falkland Is.", "Faeroe Is.",
                            "Fr. Polynesia", "Invalid", "Côte d'Ivoire", "Lao PDR", "Macao",
                            "Marshall Is.", "Invalid", "Invalid", "Dem. Rep. Korea",
                            "N. Mariana Is.", "Invalid", "Congo", "Invalid", "St-Barthélemy",
                            "St. Kitts and Nevis", "St-Martin", "St. Pierre and Miquelon",
                            "St. Vin. and Gren.", "São Tomé and Principe", "Solomon Is.",
                            "Korea", "S. Sudan", "Invalid", "Invalid", "Turks and Caicos Is.",
                            "Invalid", "U.S. Virgin Is.", "Wallis and Futuna Is.", "W. Sahara")
# Logical check: length(IOS_rebels) == length(reb_corrected_ISO_list)
ISO_tab[1] <- sapply(ISO_tab[1], function(o)
  replace(o, o %in% IOS_rebels, reb_corrected_ISO_list))
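The replacement above is positional: it only works while the unmatched names and the corrected names stay in the same order and have the same length. A possible alternative, sketched here in base R, is a named lookup vector that ties each scraped spelling to its map spelling. The scraped spellings on the left-hand side are assumptions made for illustration; only the corrected names on the right-hand side come from the list above.

```r
# Toy vector of scraped country names (assumed spellings, for illustration).
countries <- c("Czech Republic", "France", "South Korea", "South Sudan")

# Named lookup: names are the scraped spellings, values the map spellings.
fixes <- c("Czech Republic" = "Czech Rep.",
           "South Korea"    = "Korea",
           "South Sudan"    = "S. Sudan")

# Replace only the names that have an entry in the lookup; leave the rest alone.
hit <- countries %in% names(fixes)
countries[hit] <- unname(fixes[countries[hit]])
print(countries)
```

Because each correction is keyed by name, reordering the table or adding a country cannot silently shift the corrections onto the wrong rows.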